Recovering Lagging Replicas in a Fault Tolerant System
Abstract
In this paper, we discuss an often-ignored but important issue: how to quickly recover slow replicas in a fault-tolerant system. Even when replicas are deployed on identically equipped computing nodes, some replicas lag behind under heavy load for various reasons. Recovering slow replicas quickly is important because failing to do so can result in reduced throughput, high jitter in end-to-end latency, and a reduced replication degree.
Similar resources
Safety-Reliability of Distributed Embedded System Fault Tolerant Units
In this paper we compare the relative performance of two fault tolerant mechanisms dealing with repairable and non-repairable components that have failed. The relative improvement in the reliability and safety of a system with repairable components is calculated with respect to the corresponding system where the components are not repairable. The fault tolerant systems under study correspond to...
Permanent Web Publishing
LOCKSS (Lots Of Copies Keep Stuff Safe) is a prototype of a system to preserve access to scientific journals published on the Web. It is a majority-voting fault-tolerant system that, unlike normal systems, has far more replicas than would be required just to survive the anticipated failures. We are exploring techniques that exploit the surplus of replicas to permit a much looser form of coordin...
Improving Recovery in Weak-Voting Data Replication
Nowadays eager update everywhere replication protocols are widely proposed for replicated databases. They work together with recovery protocols in order to provide highly available and fault-tolerant information systems. This paper proposes two enhancements for reducing the recovery times, minimizing the recovery information to transfer. The idea is to consider on one hand a more realistic fail...
Dependability of Distributed Control System Fault Tolerant Units
We investigate two types of fault tolerant units (FTU’s) suitable for dependable distributed control systems and numerically evaluate their reliability and mean time to failure (MTTF). A simple simulation-based methodology to numerically evaluate dependability functions of a wide variety of fault tolerant units is presented. The method is based on simulation of stochastic Petri Nets. A set of 1...
A Microprocessor-Based Hybrid Duplex Fault-Tolerant System
Reliability is one of the fundamental considerations in the design of industrial control equipment. The microprocessor-based Hybrid Duplex fault-tolerant System (HDS) proposed in this paper has high reliability to meet this demand although its hardware structure is simple. The hardware configuration of HDS and the fault tolerance of this system are described. The switching control strategies in...
Publication date: 2010